16 research outputs found

    The structure of verbal sequences analyzed with unsupervised learning techniques

    Data mining allows the exploration of sequences of phenomena, whereas one usually tends to focus on isolated phenomena or on the relation between two phenomena. It offers invaluable tools for theoretical analysis and for exploring the structure of sentences, texts, dialogues, and speech. We report here the results of an attempt to use it to inspect sequences of verbs from French accounts of road accidents. The analysis relies on an original unsupervised learning approach that discovers the structure of sequential data. The only inputs to the analyzer were the verbs appearing in the sentences. It classified the links between two successive verbs into four distinct clusters, thus allowing text segmentation. We interpret these clusters by applying a statistical analysis to independent semantic annotations.

    Model-based Co-clustering for High Dimensional Sparse Data

    We propose a novel model based on the von Mises-Fisher (vMF) distribution for co-clustering high dimensional sparse matrices. While existing vMF-based models are only suitable for clustering along one dimension, our model acts simultaneously on both dimensions of a data matrix. It thereby exploits the inherent duality between rows and columns. Setting our model under the maximum likelihood (ML) approach and the classification ML (CML) approach, we derive two novel co-clustering algorithms, one hard and one soft. Empirical results on numerous synthetic and real-world text datasets demonstrate the effectiveness of our approach for modelling and co-clustering high dimensional sparse data. Furthermore, because our formulation performs an implicitly adaptive dimensionality reduction at each stage, our model alleviates the problem of high concentration parameters (kappa), a well-known difficulty in classical vMF-based models.
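For reference, the standard density of the vMF distribution on the unit sphere in d dimensions is sketched below. The notation is the usual textbook one and is an assumption here; it is not quoted from the paper, and the paper's co-clustering model is not reproduced.

```latex
f(x \mid \mu, \kappa) = c_d(\kappa)\, \exp\!\left(\kappa\, \mu^{\top} x\right),
\qquad
c_d(\kappa) = \frac{\kappa^{d/2 - 1}}{(2\pi)^{d/2}\, I_{d/2 - 1}(\kappa)},
\qquad
\lVert x \rVert = \lVert \mu \rVert = 1,\ \kappa \ge 0
```

Here \mu is the mean direction, \kappa is the concentration parameter mentioned in the abstract, and I_\nu is the modified Bessel function of the first kind. The normalizing constant depends on both d and \kappa through the Bessel term, which is one plausible reason why large concentration values are awkward to handle in high dimensions.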

    Enchaînements verbaux - étude sur le temps et l'aspect utilisant des techniques d'apprentissage non supervisé

    10 pages. National audience. Unsupervised learning allows the discovery of initially unknown categories. Current techniques make it possible to explore sequences of phenomena, whereas one tends to focus on the analysis of isolated phenomena or on the relation between two phenomena. They thus offer invaluable tools for the analysis of sequential data and, in particular, for the discovery of textual structures. We report here the results of a first attempt to use them to inspect sequences of verbs taken from sentences of French accounts of road accidents. Verbs were encoded as pairs (cat, tense), where cat is the aspectual category of a verb and tense its grammatical tense. The analysis, based on an original approach, classified the links between two successive verbs into four distinct groups (clusters), allowing text segmentation. We interpret these clusters using statistics on semantic annotations that are independent of the training process.
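The input representation described in this abstract can be sketched as follows. This is only a minimal illustration, assuming made-up category and tense labels; the names `Verb`, `Link`, and `successive_links` are hypothetical, and the clustering method itself is not described in the abstract and is not reproduced here.

```python
from collections import Counter
from typing import List, Tuple

# Hypothetical encoding: each verb is a (cat, tense) pair, as in the abstract.
# The labels used below are illustrative, not the paper's actual inventory.
Verb = Tuple[str, str]
Link = Tuple[Verb, Verb]


def successive_links(verbs: List[Verb]) -> List[Link]:
    """Pair each verb with the verb that immediately follows it."""
    return list(zip(verbs, verbs[1:]))


# A made-up verb sequence standing in for one accident account.
account: List[Verb] = [
    ("state", "imperfect"),
    ("activity", "imperfect"),
    ("event", "simple_past"),
    ("event", "simple_past"),
]

links = successive_links(account)

# Counting identical links gives the kind of table a clustering step could use.
link_counts = Counter(links)
```

A sequence of n verbs yields n - 1 links, and it is these links, rather than the individual verbs, that the abstract says were grouped into four clusters.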

    Hybrid Unsupervised Learning to Uncover Discourse Structure

    Volume of the best papers of LTC'07. International audience. Data mining allows the exploration of sequences of phenomena, whereas one usually tends to focus on isolated phenomena or on the relation between two phenomena. It offers invaluable tools for theoretical analysis and for exploring the structure of sentences, texts, dialogues, and speech. We report here the results of an attempt to use it to inspect sequences of verbs from French accounts of road accidents. The analysis relies on an original unsupervised learning approach that discovers the structure of sequential data. The only inputs to the analyzer were the verbs appearing in the sentences. It classified the links between two successive verbs into four distinct clusters, thus allowing text segmentation. We interpret these clusters by comparing the statistical distribution of independent semantic annotations.

    Stochastic Co-clustering for Document-Term Data

    International audience. Co-clustering is more useful than one-sided clustering when dealing with high dimensional sparse data. We propose to address document clustering with a generative, model-based co-clustering approach. To this end, we rely on a particular mixture of von Mises-Fisher distributions and propose a new parsimonious model that reveals a block diagonal structure as well as a good partitioning of documents and terms. Then, by estimating the model parameters under the maximum likelihood (ML) approach, we derive three novel co-clustering algorithms: a soft one and two stochastic variants. Empirical results on numerous simulated and real-world datasets demonstrate the advantages of our approach for modelling and co-clustering high dimensional sparse data.

    An Efficient Incremental Collaborative Filtering System

    International audience. Collaborative filtering (CF) systems aim at recommending a set of personalized items to an active user, according to the preferences of other similar users. Many methods have been developed, and some, such as those based on similarity and matrix factorization (MF), can achieve very good recommendation accuracy, but they are computationally prohibitive. Applying such approaches to real-world applications in which the available information changes frequently is therefore a non-trivial task. To address this problem, we propose a novel, efficient incremental CF system based on a weighted clustering approach. Our system provides high-quality recommendations at a very low computational cost. Experimental results on several real-world datasets confirm the efficiency and effectiveness of our method, showing that it is significantly better than existing incremental CF methods in terms of both scalability and recommendation quality.

    Sequencing of verbs - a study on tense and aspect using unsupervised learning

    International audience. We report here the results of an attempt to use data mining tools to inspect sequences of verbs from French accounts of road accidents. The analysis relies on an original unsupervised learning approach that discovers the structure of sequential data. The only inputs to the analyzer were the verbs appearing in the sentences. It classified the links between two successive verbs into four distinct clusters, thus allowing text segmentation. We interpret these clusters by applying a statistical analysis to independent semantic annotations.